6.824 Lab2

6.824: Distributed Systems Engineering

Lab2: Raft

使用的是2018 Spring的raft lab版本

摘要：本文简单写一个完成6.824 lab 2的一些感受，并简单说一下实现raft的难点和需要注意的地方。部分地方会参考Student's Guide to Raft，大部分资料来自Raft Paper，一个非常好的可视化教程，一个官方的Raft Visualization。

感受

这个Lab比之前的mapreduce lab 难了很多，工作量也增加了很多，尽管代码总量不是非常多，但是调试工作要进行非常多的时间，甚至占整个lab 80%以上的时间。我第一次尝试时，花了两天完成了Test 2A，然后一天半花在了Test 2B，但是Test 2B还是有一个小的部分没过，加上第一次写的时候代码非常丑陋，对Golang也不是很熟悉，于是先放了一放。

实验指导这么说

Your first implementation may not be clean enough that you can easily reason about its correctness. Give yourself enough time to rewrite your implementation so that you can easily reason about its correctness. Subsequent labs will build on this lab, so it is important to do a good job on your implementation.

所以在一周后进行了重写，也就是现在看到的结果。

重写相对比较顺利，一天时间完成了Test 2A，第二天完成了Test 2B、Test 2C，代码结构也变得更为清晰，不过仍然是将近80%的时间花在了调试上。在写这篇文章前又重读了Student's Guide to Raft，发现确实很多坑都踩上了，所以在实现时还是要好好读这篇文章。

实验指导书要求最终三个Test总时间不能超过4分钟，我的实现最终花了2分多钟跑完了。

$ go test -run 2
Test (2A): initial election ...
labgob warning: Decoding into a non-default variable/field Index may not work
  ... Passed --   3.0  3  118    0
Test (2A): election after network failure ...
  ... Passed --   4.6  3  256    0
Test (2B): basic agreement ...
  ... Passed --   0.5  5   32    3
Test (2B): agreement despite follower disconnection ...
  ... Passed --   5.8  3  240    8
Test (2B): no agreement if too many followers disconnect ...
  ... Passed --   3.4  5  380    3
Test (2B): concurrent Start()s ...
  ... Passed --   0.7  3   22    6
Test (2B): rejoin of partitioned leader ...
  ... Passed --   5.9  3  350    4
Test (2B): leader backs up quickly over incorrect follower logs ...
  ... Passed --  15.4  5 2627  102
Test (2B): RPC counts aren't too high ...
  ... Passed --   2.2  3   84   12
Test (2C): basic persistence ...
  ... Passed --   3.4  3  130    6
Test (2C): more persistence ...
  ... Passed --  16.9  5 1840   17
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   2.9  3   68    4
Test (2C): Figure 8 ...
  ... Passed --  31.8  5 1719   15
Test (2C): unreliable agreement ...
  ... Passed --   5.5  5  424  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  28.7  5 5300   42
Test (2C): churn ...
  ... Passed --  16.4  5 2140  155
Test (2C): unreliable churn ...
  ... Passed --  16.4  5 1863  215
PASS
ok      raft    164.123s

难点和坑点

From THE GUILD

In fact, Figure 2 is extremely precise, and every single statement it makes should be treated, in specification terms, as MUST, not as SHOULD. For example, you might reasonably reset a peer’s election timer whenever you receive an AppendEntries or RequestVote RPC, as both indicate that some other peer either thinks it’s the leader, or is trying to become the leader. Intuitively, this means that we shouldn’t be interfering. However, if you read Figure 2 carefully, it says:

虽然说论文中Figure 2讲的比较详细，但是仍然有几点不足。第一，没有涵盖所有细节，比如何定义log 是不是 up-to-date等等。第二，并不是所有的要求都集中在一个Section中，比如在实现RequestVote RPC或者 AppendEntries RPC时，你不仅仅要考虑这两个RPC的要求，还要考虑发送者或接受者作为"follwer, candidate, leader"的区别，每一种属性都有不同的要求，同时对于三种身份还有共同的要求。第三，图2并没有讲清楚所有的数据结构，具体实现时候需要增加很多属性，决定某些属性的类型。

If commitIndex > lastApplied at any point during execution, you should apply a particular log entry. It is not crucial that you do it straight away (for example, in the AppendEntries RPC handler), but it is important that you ensure that this application is only done by one entity. Specifically, you will need to either have a dedicated “applier”, or to lock around these applies, so that some other routine doesn’t also detect that entries need to be applied and also tries to apply.

论文里并没有谈到如何去apply log，事实上在实现时需要在Go中新开一个线程来提交log，这个往往会在刚开始时造成困难

If a leader sends out an AppendEntries RPC, and it is rejected, but not because of log inconsistency (this can only happen if our term has passed), then you should immediately step down, and not update nextIndex. If you do, you could race with the resetting of nextIndex if you are re-elected immediately.

这也是我后期发现的一个小问题，AE RPC返回False并不一定是log不match，还有可能是这轮已经结束了，所以不要看到false就nextIndex -= 1。

A leader is not allowed to update commitIndex to somewhere in a previous term (or, for that matter, a future term). Thus, as the rule says, you specifically need to check that log[N].term == currentTerm. This is because Raft leaders cannot be sure an entry is actually committed (and will not ever be changed in the future) if it’s not from their current term. This is illustrated by Figure 8 in the paper.

这个也是后期发现的一个大问题，论文图2有提但是非常容易被忽视，论文在P8-P9页说明了这个细节：为了避免出现图 8 那样的情况，leader只允许commit当前轮次的log

From Practice

你第一次实现的代码，必然是靠想象的多，大部分是没有严格按照论文来的，就算长时间调试不出来也是非常正常的。在后续的每一次重写/修改后，你的代码会渐渐符合论文中描述的样子，但是仍然会有小的问题偶尔出现。最终应该是和论文中每一个小细节都注意到了，代码也比第一次完成的代码都了一倍，因为考虑了很多小细节，加了很多调试代码，好在util.go中的DPrintf非常好用。

另外，熟悉Golang非常重要，这份Go tour非常好用，稍微会点C/C++读完基本就达到可以实现Lab 2的地步了，不过如果你已经做过Lab 1- MapReduce，那么你应该已经对Go比较熟悉了。

时间

写于2019年11月4日